The ACM Digital Library (https://dl.acm.org) is arguably the most important
collection of computer science literature in existence. It covers almost the
entire history of the field and includes many of the most influential papers
dating from the 1950s to today.

This archive contains 487,386 full-text articles totaling 519 GiB. It was
compiled in June 2020 from the subset of articles in the digital library that
have full-text PDF versions available.

All content and metadata in the archive were obtained entirely from publicly
accessible parts of the digital library that ACM had explicitly stated were
available for free access and download.

The articles are organized into a series of .zip files, each containing 1,000
PDF files named based on the article's DOI (Digital Object Identifier). The
included SQLite database contains all relevant metadata including title,
authors, publication date, and the journal/conference in which the article
appeared. Abstracts are also included where available. Consult the database
schema for details about how the metadata is organized.
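
One way to consult the schema without any extra tools is to ask SQLite itself
for its tables and columns. The sketch below uses only Python's built-in
sqlite3 module. The function name list_schema and the database path passed to
it are assumptions for illustration; substitute the actual filename of the
database shipped with the archive.

```python
import sqlite3


def list_schema(db_path):
    """Return {table_name: [column_name, ...]} for every table in the database.

    db_path is a placeholder; pass the path of the archive's SQLite file.
    Note that sqlite3.connect() creates an empty file if the path is wrong.
    """
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master lists every object in the database; keep only tables.
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        schema = {}
        for table in tables:
            # PRAGMA table_info yields one row per column; index 1 is its name.
            columns = conn.execute(
                'PRAGMA table_info("{}")'.format(table)).fetchall()
            schema[table] = [column[1] for column in columns]
        return schema
    finally:
        conn.close()
```

Calling list_schema with the path to the database and printing the result
gives a quick overview of which tables and columns exist before writing any
queries against them.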

A sample Python program is included that serves as a demonstration of how the
database can be used, and as a primitive but functional means of browsing the
collection. It exposes a local HTTP server providing a web interface to browse
the available issues and articles, including viewing the PDF documents
contained in the zip files.

To run the program, ensure you have Python 3.6 or later installed. Then run the
command below and go to http://127.0.0.1:11113 in your browser.

python3 viewer.py

This program utilizes Python's built-in zipfile module to serve up PDF files
directly to your browser without requiring the full zip file containing the PDF
to be extracted. Similar libraries are available for other languages. If so
inclined, you are encouraged to build alternative frontends with richer
features such as search functionality for titles/abstracts or even full-text
search on the PDFs themselves.
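
As a starting point for an alternative frontend, the following is a minimal
sketch of reading a single PDF out of one of the zip files without extracting
the rest. It is not taken from viewer.py. The function names, and whatever zip
path and member name you pass in, are assumptions; use list_pdfs to discover
the real member names inside a given archive.

```python
import zipfile


def list_pdfs(zip_path):
    """Return the names of all members in the zip that end in .pdf."""
    with zipfile.ZipFile(zip_path) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith(".pdf")]


def read_pdf_from_zip(zip_path, member_name):
    """Return the bytes of one member; only that member is decompressed."""
    with zipfile.ZipFile(zip_path) as archive:
        # Raises KeyError if member_name is not present in the archive.
        return archive.read(member_name)
```

The bytes returned by read_pdf_from_zip can be written to an HTTP response
with a Content-Type of application/pdf, or passed to a PDF text extraction
library when building a full-text index.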

Note: The articles table in the database contains a total of 531,370 entries.
Of these, 43,984 do not have any associated PDF files. This discrepancy is due
to two factors: 1) many conference proceedings include entries like section
titles and sessions/workshops for which there is no actual paper, and 2) some
of the items simply did not have a PDF version (or any full-text version)
available in the digital library.
